No Unbiased Estimator of the Variance of K-Fold Cross-Validation

نویسندگان

  • Yoshua Bengio
  • Yves Grandvalet
چکیده

Most machine learning researchers perform quantitative experiments to estimate generalization error and compare algorithm performances. In order to draw statistically convincing conclusions, it is important to estimate the uncertainty of such estimates. This paper studies the estimation of uncertainty around the K-fold cross-validation estimator. The main theorem shows that there exists no universal unbiased estimator of the variance of K-fold cross-validation. An analysis based on the eigendecomposition of the covariance matrix of errors helps to better understand the nature of the problem and shows that naive estimators may grossly underestimate variance, as con£rmed by numerical experiments.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Efficient Computation of Unconditional Error Rate Estimators for Learning Algorithms and an Application to a Biomedical Data Set Master Thesis

We derive an unbiased variance estimator for re-sampling procedures using the fact that those procedures are incomplete U-statistics. Our approach is based on careful examination of the combinatorics governing the covariances between re-sampling iterations. We establish such an unbiased variance estimator for the special case of K-Fold cross-validation. This estimator exists as soon as new obse...

متن کامل

Minimization and estimation of the variance of prediction errors for cross-validation designs

We consider the mean prediction error of a classification or regression procedure as well as its cross-validation estimates, and investigate the variance of this estimate as a function of an arbitrary cross-validation design. We decompose this variance into a scalar product of coefficients and certain covariance expressions, such that the coefficients depend solely on the resampling design, and...

متن کامل

The Structure of Bhattacharyya Matrix in Natural Exponential Family and Its Role in Approximating the Variance of a Statistics

In most situations the best estimator of a function of the parameter exists, but sometimes it has a complex form and we cannot compute its variance explicitly. Therefore, a lower bound for the variance of an estimator is one of the fundamentals in the estimation theory, because it gives us an idea about the accuracy of an estimator. It is well-known in statistical inference that the Cram&eac...

متن کامل

Long-term Streamflow Forecasting by Adaptive Neuro-Fuzzy Inference System Using K-fold Cross-validation: (Case Study: Taleghan Basin, Iran)

Streamflow forecasting has an important role in water resource management (e.g. flood control, drought management, reservoir design, etc.). In this paper, the application of Adaptive Neuro Fuzzy Inference System (ANFIS) is used for long-term streamflow forecasting (monthly, seasonal) and moreover, cross-validation method (K-fold) is investigated to evaluate test-training data in the model.Then,...

متن کامل

Estimating the Maximum Expected Value: An Analysis of (Nested) Cross Validation and the Maximum Sample Average

We investigate the accuracy of the two most common estimators for the maximum expected value of a general set of random variables: a generalization of the maximum sample average, and cross validation. No unbiased estimator exists and we show that it is non-trivial to select a good estimator without knowledge about the distributions of the random variables. We investigate and bound the bias and ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Journal of Machine Learning Research

دوره 5  شماره 

صفحات  -

تاریخ انتشار 2003